Neural Networks
=Artificial Neural Networks= An Artificial Neural Network (ANN, often called simply NN in computer science) has been invented by McCullock and Pitts in 1943, with a mathematical model of simplified neurons. ANN is a computational mode that shows interesting proprieties, similar to the human's brain Proprieties * Learning: they are able to learn and improve their output quality with experience. * Flexibility (fault tolerant): given a partially-correct input, they can give the correct solution. * Complexity: with growth of the number of neurons, the network become really complex. * Distributed: the computation is distributed through each node of the network, that behaves by itself and tries to minimize the output error in cooperation with other nodes. You can imagine a NN as a cluster of interconnected nodes that, with experience, balances the workload on each node and tries to converge to a solution. The solution is actually a function: it's proven that with a neural network we can simulate any function. Structure The structure of the network can change with the context of the application, but we can identify some constrain. The base node, for example, is a mathematical representation of a biological neuron, and it's composed by: * Inputs: connection with other neurons or directly to "sensors" (i.e. input values from the world). * Output: the value given by the unit after the computation (i.e. the value of the neuron when it's actived). * Activation function: the function that describes the cell's behavior. * Connections: the contact between neurons (called synapse in biological model). In both BNN and ANN the neuron produces an output that is a transformed weighted sum of it's input. The transformation is actually done by the activation function: : y = f(\sum_{i=0}^n (x_i \cdot w_i)) The output y is the activation function f applied to the sum of the i-th input multiplied to the i-th weight. Biological Model In the biological model (Biological Neural Network), the activation function is the heaviside: there is not a "soft" output function as it exists in the ANN. When sufficient number of positive or negative ions comes to the dendrites (i.e. inputs), an instability in the neuron is generated and the unit spread the signals (i.e. output) to other neurons. It's obvious that the input and output values are Volts, that is real values. *Nucleus: the central concentrated essence of a neuron. (Contains DNA.) *Dendrite: input sensors of neurons. *Synapse: connections between neurons Mathematical Model In the mathematical model we can use different activation functions and input values, not only real values as the BNN, but also binary and integer values. In addition, we can use "soft" activation functions, like sigmoids and semi-linear functions. Topologies As computer networks, also neural network can be combined with different topologies with different proprieties. We can define some main classes: * Single-strate: when we use a single line of parallel cells to filter the inputs. For example, we can use single strate networks to reproduce logical ports AND, OR and NOT (but not XOR). * Dual-strate: we have input cells connected with output cells (sometimes this is referred also to single-strate). Cells of the same strate aren't connected toghether, but a single cell can be connected to many cells of the other strate. * Multi-strate: we introduce "hidden layers" between input and output strates. A strate is connected only with the previous and the next. * Mesh/combination of the above: obviously NN can have irregular structures, like a combination of the above, like jumps between non-adiacent strates, like a mesh structure (each node connected with each other). It seems that neural networks tend to evolve gaining a mesh structure, but it's known that in the human brain there are some network main blocks, almost indipendent from each other. Functions Different kind of activation functions does exists. Common ones are the HS (heaviside) and the sigmoid. : HS(x) = \begin{cases} 0, x < 0 \\ 1, x \geq 0 \\ \end{cases} : P(x) = \frac{1}{1 + e^{-x}} Plane subdivision Neural networks are often seen as classifier, or plane partitioner. This mean that if you put in a plane the points of your solution, the network to work properly should be able to discriminate these points. I bring you as an example the logic ports: a single unit can be sufficient to discriminate points in the plane of a AND, OR and NOT functions, but it's not sufficient for a XOR. As probabily you noticed, there is a constant input -1, with a certain weight. Usually this is set to actually translate the activation function. This weight is called w_0 and it's the one that actually determine the activation of the function when the weighted sum of the other inputs pass this fixed weight. As you probabily noticed, a single neuron can be viewed like a line that divide the plane in 2 half. Using more complex network the space partitioning can be more effective. For example, to represent the XOR function we can use a dual strate network. So you can imagine the network as shapes that divide the space to discriminate the solutions. With the complexity of the network grows the complexity of the shapes and the dimensions of the space. Training A neural network should be trained before be used correctly. When we train a network we talk about "training epoch". We can train the network by providing some sets of input/output combinations. These are called "training sets". An epoch is computed as follow: # For each I/O set ## Give the input ## Compute the output ## Compute the error between given output and the expected output ## Modify the weights between units Usually you want to calculate the global error for all examples and go thru epoch until you get an acceptable error. The distribution of the training set is important for a good result: if they are homogenously distributed, the training will be effective. For example, if you train a network for learning the addition function, you should provide examples like: 4+5=9, 5+4=9, 1+8=9 and 8+1=9. As we said, the training is made by modifiying the weights of the connections. This is made by some algorythims like the Delta Rule, and it's based on doing steps of the weights trying to minimize the error. A good approach would be to move the weights following the derivate of the error curve to find the minimum error point: if the error derivate is positive, then the error grows if we move to the left, so we move to the right. And viceversa. Network categories Usually we can subdivide the ANN in 3 categories, based on their contextual function. Associative memories In this case we use NN to associate some inputs to some outputs. The advantages is the fault-tolerancy of the memory. This category is used for example in image pattern recognition: given an input similar to the right one, the network still produce the correct output. Function simulator We told early that neural networks can be used to compute functions. We can train a network to behave like a function that we don't know how to compute, but we know how to behave (for example knowing it's graph). Classifiers NN are also use to discriminate inputs in certain categories. For example we can use a network to classify some geometric figures on a plane. Topologies Feedforward Signal is transmitted only toward one or more forward layers. No connection to back or same layer are present. In this case the computation is a functional transformation of the inputs. Feedback (recursive) Signal is transmitted to one or more layer, both forward and backward (and even within the same layer of neurons). In this case we can have more complex propagation of the signal, allowing limited memory capabilities. For example you can see how storing is possible by building a SR-Latch with 2 neurons configured as NOR ports. Programatic Implementation There are commonly two ways to implement a neural network: one is to work on arrays of values and one is based on the concept of neurons. Usually the first approach is more a procedural one, while the concept of neuron is adapt for Object Orientation. = Resources = Joey Rogers, Object-Oriented Neural Networks in C++. San Francisco: Morgan Kaufmann, 1996. External Tutorials And Articles * AI Junkie - Neural Networks In Plain English * generation5 - Articles About Neural Networks * comp.ai.neural-nets - Neural Networks Newsgroup * NN Course at Willamette University * Introduction to neural networks * NN at statsoft.com Category: Artificial intelligence